Joint estimation of isoform expression and isoform-specific read distribution using multisample RNA-Seq data

Authors

  • Chen Suo
  • Stefano Calza
  • Agus Salim
  • Yudi Pawitan
Abstract

MOTIVATION: RNA-sequencing technologies provide a powerful tool for expression analysis at the gene and isoform level, but accurate estimation of isoform abundance is still a challenge. The standard assumption of uniform read intensity yields biased estimates when the read intensity is in fact non-uniform. The problem is that, without strong assumptions, the read intensity pattern is not identifiable from data observed in a single sample.

RESULTS: We develop a joint statistical model that accounts for non-uniform isoform-specific read distribution and gene isoform expression estimation. The main challenge is in dealing with the large number of isoform-specific read distributions, which are potentially as many as the number of splice variants in the genome. A statistical regularization via a smoothing penalty is imposed to control the estimation. Also, for identifiability reasons, the method uses information across samples from the same region. We develop a fast and robust computational procedure based on the iterated-weighted least-squares algorithm, and apply it to simulated data and two real RNA-Seq datasets with reverse transcription-polymerase chain reaction validation. Empirical tests show that our model performs better than existing methods in terms of increasing precision in isoform-level estimation.

AVAILABILITY AND IMPLEMENTATION: We have implemented our method in an R package called Sequgio as a pipeline for fast processing of RNA-Seq data.
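The bias described in the motivation can be illustrated with a toy calculation. The sketch below is not the Sequgio implementation and does not reproduce the paper's model. It assumes a hypothetical gene with three exons and two isoforms, made-up read intensities, noise-free counts, and plain least squares instead of the penalized procedure from the abstract. All names (`membership`, `intensity`, `least_squares`) are illustrative.

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [x, y] = [e, f] with Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - c * e) / det)


def least_squares(design, counts):
    """Ordinary least squares for two isoforms via the normal equations."""
    col_a = [row[0] for row in design]
    col_b = [row[1] for row in design]
    s_aa = sum(u * u for u in col_a)
    s_ab = sum(u * v for u, v in zip(col_a, col_b))
    s_bb = sum(v * v for v in col_b)
    t_a = sum(u * y for u, y in zip(col_a, counts))
    t_b = sum(v * y for v, y in zip(col_b, counts))
    return solve_2x2(s_aa, s_ab, s_ab, s_bb, t_a, t_b)


# Hypothetical gene: isoform A uses exons 1-3, isoform B skips exon 2.
exon_length = [100.0, 100.0, 100.0]
membership = [(1.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

# Assumed isoform-specific read intensities per exon (A is 3'-biased here).
intensity = [(0.6, 1.0), (0.6, 0.0), (1.8, 1.0)]
true_expr = (2.0, 1.0)

# Noise-free expected exon counts: length * sum(expression * intensity).
counts = [
    length * (true_expr[0] * w_a + true_expr[1] * w_b)
    for length, (w_a, w_b) in zip(exon_length, intensity)
]

# Fit 1: pretend the read intensity is uniform (all weights equal to 1).
uniform_design = [
    (length * m_a, length * m_b)
    for length, (m_a, m_b) in zip(exon_length, membership)
]
uniform_fit = least_squares(uniform_design, counts)

# Fit 2: use the intensities that generated the data (an oracle, for contrast).
informed_design = [
    (length * w_a, length * w_b)
    for length, (w_a, w_b) in zip(exon_length, intensity)
]
informed_fit = least_squares(informed_design, counts)

print("true expression:   ", true_expr)
print("uniform assumption:", uniform_fit)
print("known intensities: ", informed_fit)
```

In this toy setup the uniform fit swaps the ranking of the two isoforms, while the oracle fit recovers the true values. In practice the intensities are unknown and, as the abstract notes, cannot be identified from a single sample, which is why the paper pools information across samples and regularizes the estimate.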

Related articles

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using a de Bruijn graph with d...

PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution

Correctly estimating isoform-specific gene expression is important for understanding complicated biological mechanisms and for mapping disease susceptibility genes. However, estimating isoform-specific gene expression is challenging because various biases present in RNA-Seq (RNA sequencing) data complicate the analysis, and if not appropriately corrected, can affect isoform expression estimatio...

MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification by Wei

Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structures, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate isoform quantification from RNA-seq data is challengi...

P-121: Cloning and Expression of the Inosine Triphosphate Pyrophosphatase Gene Variant II in E. coli

Background: Inappropriate environmental and cellular conditions can damage the cell's nucleotide pool. Deamination and oxidation damage interferes with the cell's vital reactions. Inosine triphosphate pyrophosphatase (ITPA), an evolutionarily conserved enzyme, plays a critical role in the elimination of non-canonical bases. In the human genome, the ITPA gene is located on the short arm of chromosome 20 and tran...

eQTL Mapping Using RNA-seq Data.

As RNA-seq is replacing gene expression microarrays to assess genome-wide transcription abundance, gene expression Quantitative Trait Locus (eQTL) studies using RNA-seq have emerged. RNA-seq delivers two novel features that are important for eQTL studies. First, it provides information on allele-specific expression (ASE), which is not available from gene expression microarrays. Second, it gener...

Journal:
  • Bioinformatics

Volume 30, Issue 4

Pages: -

Publication date: 2014